{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WfGpInivS0fG"
   },
   "source": [
    "<h2 align=\"center\">点击下列图标在线运行HanLP</h2>\n",
    "<div align=\"center\">\n",
    "\t<a href=\"https://colab.research.google.com/github/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/srl_mtl.ipynb\" target=\"_blank\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\t<a href=\"https://mybinder.org/v2/gh/hankcs/HanLP/doc-zh?filepath=plugins%2Fhanlp_demo%2Fhanlp_demo%2Fzh%2Fsrl_mtl.ipynb\" target=\"_blank\"><img src=\"https://mybinder.org/badge_logo.svg\" alt=\"Open In Binder\"/></a>\n",
    "</div>\n",
    "\n",
    "## 安装"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IYwV-UkNNzFp"
   },
   "source": [
    "无论是Windows、Linux还是macOS，HanLP的安装只需一句话搞定："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1Uf_u7ddMhUt"
   },
   "outputs": [],
   "source": [
    "!pip install hanlp -U"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pp-1KqEOOJ4t"
   },
   "source": [
    "## 加载模型\n",
    "HanLP的工作流程是先加载模型，模型的标示符存储在`hanlp.pretrained`这个包中，按照NLP任务归类。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "0tmKBu7sNAXX"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'CPB3_SRL_ELECTRA_SMALL': 'https://file.hankcs.com/hanlp/srl/cpb3_electra_small_crf_has_transform_20220218_135910.zip'}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import hanlp\n",
    "hanlp.pretrained.srl.ALL # 语种见名称最后一个字段或相应语料库"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EmZDmLn9aGxG"
   },
   "source": [
    "调用`hanlp.load`进行加载，模型会自动下载到本地缓存："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "srl = hanlp.load('CPB3_SRL_ELECTRA_SMALL')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "elA_UyssOut_"
   },
   "source": [
    "## 语义角色分析\n",
    "为已分词的句子执行语义角色分析："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 70
    },
    "id": "BqEmDMGGOtk3",
    "outputId": "2a0d392f-b99a-4a18-fc7f-754e2abe2e34"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[('2021年', 'ARGM-TMP', 0, 1),\n",
       "  ('HanLPv2.1', 'ARG0', 1, 2),\n",
       "  ('为生产环境', 'ARG2', 2, 5),\n",
       "  ('带来', 'PRED', 5, 6),\n",
       "  ('次世代最先进的多语种NLP技术', 'ARG1', 6, 15)],\n",
       " [('次世代', 'ARGM-TMP', 6, 8),\n",
       "  ('最', 'ARGM-ADV', 8, 9),\n",
       "  ('先进', 'PRED', 9, 10),\n",
       "  ('技术', 'ARG0', 14, 15)]]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "srl(['2021年', 'HanLPv2.1', '为', '生产', '环境', '带来', '次', '世代', '最', '先进', '的', '多', '语种', 'NLP', '技术', '。'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "语义角色标注结果中每个四元组的格式为`[论元或谓词, 语义角色标签, 起始下标, 终止下标]`。其中，谓词的语义角色标签为`PRED`，起止下标对应单词数组。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "遍历谓词论元结构："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "第1个谓词论元结构：\n",
      "2021年 = ARGM-TMP at [0, 1]\n",
      "HanLPv2.1 = ARG0 at [1, 2]\n",
      "为生产环境 = ARG2 at [2, 5]\n",
      "带来 = PRED at [5, 6]\n",
      "次世代最先进的多语种NLP技术 = ARG1 at [6, 15]\n",
      "第2个谓词论元结构：\n",
      "次世代 = ARGM-TMP at [6, 8]\n",
      "最 = ARGM-ADV at [8, 9]\n",
      "先进 = PRED at [9, 10]\n",
      "技术 = ARG0 at [14, 15]\n"
     ]
    }
   ],
   "source": [
    "for i, pas in enumerate(srl(['2021年', 'HanLPv2.1', '为', '生产', '环境', '带来', '次', '世代', '最', '先进', '的', '多', '语种', 'NLP', '技术', '。'])):\n",
    "    print(f'第{i+1}个谓词论元结构：')\n",
    "    for form, role, begin, end in pas:\n",
    "        print(f'{form} = {role} at [{begin}, {end}]')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 注意\n",
    "Native API的输入单位限定为句子，需使用[多语种分句模型](https://github.com/hankcs/HanLP/blob/master/plugins/hanlp_demo/hanlp_demo/sent_split.py)或[基于规则的分句函数](https://github.com/hankcs/HanLP/blob/master/hanlp/utils/rules.py#L19)先行分句。RESTful同时支持全文、句子、已分词的句子。除此之外，RESTful和native两种API的语义设计完全一致，用户可以无缝互换。"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "name": "srl_mtl.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
